Synergizing Deconfounding and Temporal Generalization For Time-series Counterfactual Outcome Estimation
Liu, Yiling, Dong, Juncheng, Fu, Chen, Shi, Wei, Jiang, Ziyang, Hua, Zhigang, Carlson, David
Estimating counterfactual outcomes from time-series observations is crucial for effective decision-making, e.g., when to administer a life-saving treatment, yet remains significantly challenging because (i) the counterfactual trajectory is never observed and (ii) confounders evolve with time and distort estimation at every step. To address these challenges, we propose a novel framework that synergistically integrates two complementary approaches: Sub-treatment Group Alignment (SGA) and Random Temporal Masking (RTM). Instead of the coarse practice of aligning the marginal distributions of the treatments in latent space, SGA uses iterative treatment-agnostic clustering to identify fine-grained sub-treatment groups. Aligning these fine-grained groups achieves improved distributional matching, thus leading to more effective deconfounding. We theoretically demonstrate that SGA optimizes a tighter upper bound on counterfactual risk and empirically verify its deconfounding efficacy. RTM promotes temporal generalization by randomly replacing input covariates with Gaussian noise during training. This encourages the model to rely less on potentially noisy or spuriously correlated covariates at the current step and more on stable historical patterns, thereby improving its ability to generalize across time and better preserving underlying causal relationships. Our experiments demonstrate that while applying SGA and RTM individually improves counterfactual outcome estimation, their synergistic combination consistently achieves state-of-the-art performance. This success comes from their distinct yet complementary roles: RTM enhances temporal generalization and robustness across time steps, while SGA improves deconfounding at each specific time point.
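The RTM idea described in the abstract, randomly replacing per-step input covariates with Gaussian noise during training, can be sketched in a few lines. This is a minimal illustration only; the function name, array shapes, and masking probability below are our own assumptions, not details from the paper:

```python
import numpy as np

def random_temporal_masking(x, mask_prob=0.2, rng=None):
    """Randomly replace per-time-step covariates with Gaussian noise.

    x: array of shape (T, D), one covariate vector per time step.
    mask_prob: probability that any given time step is masked.
    Returns a copy of x with masked steps replaced by N(0, 1) noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).copy()
    mask = rng.random(x.shape[0]) < mask_prob          # pick steps to mask
    x[mask] = rng.standard_normal((mask.sum(), x.shape[1]))
    return x

# Training-time usage: corrupting current-step covariates pushes the model
# to lean on stable historical patterns rather than any single covariate.
x = np.ones((10, 3))
x_masked = random_temporal_masking(x, mask_prob=0.3, rng=np.random.default_rng(0))
```

Applied only during training, this acts as a structured input-dropout regularizer along the time axis.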
Continuum Dropout for Neural Differential Equations
Lee, Jonghun, Oh, YongKyung, Kim, Sungil, Lim, Dong-Young
Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their advantages, NDEs face a fundamental challenge in adopting dropout, a cornerstone of deep learning regularization, making them susceptible to overfitting. To address this research gap, we introduce Continuum Dropout, a universally applicable regularization technique for NDEs built upon the theory of alternating renewal processes. Continuum Dropout formulates the on-off mechanism of dropout as a stochastic process that alternates between active (evolution) and inactive (paused) states in continuous time. This provides a principled approach to prevent overfitting and enhance the generalization capabilities of NDEs. Moreover, Continuum Dropout offers a structured framework to quantify predictive uncertainty via Monte Carlo sampling at test time. Through extensive experiments, we demonstrate that Continuum Dropout outperforms existing regularization methods for NDEs, achieving superior performance on various time series and image classification tasks. It also yields better-calibrated and more trustworthy probability estimates, highlighting its effectiveness for uncertainty-aware modeling.
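The alternating on-off mechanism described above can be illustrated by sampling a path of an alternating renewal process. The sketch below assumes exponential holding times for simplicity; the paper's actual duration distributions, parameterization, and interface may differ:

```python
import numpy as np

def sample_onoff_gate(t_max, rate_on=1.0, rate_off=1.0, rng=None):
    """Sample one path of an alternating renewal process on [0, t_max].

    The process alternates between active (gate=1) and paused (gate=0)
    states; holding times here are exponential, an illustrative choice.
    Returns a function gate(t) -> 0.0 or 1.0.
    """
    rng = np.random.default_rng() if rng is None else rng
    times, states = [0.0], [1]          # start in the active state at t=0
    t, active = 0.0, True
    while t < t_max:
        rate = rate_on if active else rate_off
        t += rng.exponential(1.0 / rate)  # sample next switching time
        active = not active
        times.append(t)
        states.append(1 if active else 0)
    times = np.asarray(times)

    def gate(t_query):
        # state in force at t_query = state of the last switch before it
        idx = np.searchsorted(times, t_query, side="right") - 1
        return float(states[idx])
    return gate

# Monte Carlo at test time: average predictions over several sampled gate
# paths to obtain an uncertainty estimate, in the spirit of MC dropout.
g = sample_onoff_gate(5.0, rng=np.random.default_rng(1))
values = [g(t) for t in np.linspace(0.0, 5.0, 11)]
```

In an NDE, such a gate would multiply the vector field so the hidden state evolves only while the gate is active.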
A Appendix
The f's are feedforward networks with three residual blocks. We often place restrictions on the derivations to operationalize domain-specific constraints; in practice, these constraints are implemented by assigning the restricted f's to 0.

A.2 Lower Bound Derivation

Due to memory constraints, in practice we use a batch size of 1 and simulate larger batch sizes through gradient accumulation. We observed training to be somewhat unstable on some datasets. For SCAN, all models and embeddings are 256-dimensional. We tune over the number of layers, hidden units, and dropout rate.
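The batch-size-1 setup that simulates larger batches through gradient accumulation, as mentioned above, can be sketched as follows. The toy squared loss and the function name are illustrative choices, not the paper's implementation:

```python
import numpy as np

def accumulated_sgd_step(w, batches, lr=0.1):
    """One optimizer step whose gradient is accumulated over micro-batches.

    Each micro-batch (size 1, as in the text) contributes the gradient of
    a squared loss for a linear model y = x @ w; stepping once with the
    mean gradient matches a single step on the full batch.
    """
    grad = np.zeros_like(w)
    n = 0
    for x, y in batches:                 # micro-batches of size 1
        pred = x @ w
        grad += 2.0 * (pred - y) * x     # gradient of (x @ w - y)^2
        n += 1
    return w - lr * grad / n             # one step with the mean gradient

# Toy check: two size-1 micro-batches accumulated into a single update.
w0 = np.zeros(2)
data = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 2.0)]
w1 = accumulated_sgd_step(w0, data, lr=0.5)
```

The key point is that the optimizer steps once per accumulation window, so memory stays at the micro-batch level while the effective batch size grows.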
A Table of Notation

Notation | Description
λ, w     | Hyperparameters and parameters
L_T, L_… | …
Table 2: A summary of notations used in this paper.

In this section, we present the training algorithm for Self-Tuning Networks. Let A and B be square positive definite matrices. From Eqn. C.31, we get: r_λ(λ) = … Eqn. D.4 can be represented as: … The second term in Eqn. D.15 is: … Therefore, the first and second terms correspond to the first- and second-order Taylor approximations to the loss. In this section, we describe a structured best-response approximation for convolutional layers.
Supplementary Material
SSAL: Synergizing between Self-Training and Adversarial Learning for Domain Adaptive Object Detection
In this supplementary material, the following topics are discussed: we include the training algorithm (Sec. …). In particular, we see a maximum decrease of 0.8% in mAP score when increasing … Although we set both thresholds at 0.5, we find that our method is relatively robust to these hyperparameters. Tab. 4 reveals that a model trained on the source domain (Sim10k [ … ]) … Calibration is measured using the ECE score.

Models      | ECE Score
Source Only | 0.25
Oracle      | 0.10

Detections missed by the EPM and found by our method are shown in blue.
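The ECE score used for calibration above can be computed with a standard equal-width binning scheme. This is a generic sketch with hypothetical inputs, not the paper's evaluation code:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error (ECE) over equal-width confidence bins.

    confidences: top-class confidence per sample, in [0, 1].
    correct: 1 if the prediction was right, else 0.
    ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)|.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        acc = correct[in_bin].mean()        # empirical accuracy in the bin
        conf = confidences[in_bin].mean()   # mean confidence in the bin
        ece += in_bin.mean() * abs(acc - conf)
    return ece

# Toy example: half the samples at 0.9 confidence (all correct) and half
# at 0.6 confidence (one of two correct) give ECE = 0.5*0.1 + 0.5*0.1 = 0.1.
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
```

Lower ECE means confidences track empirical accuracy more closely, which is what the Source Only vs. Oracle comparison above measures.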
Supplementary Material
It is worth noting that Eq. … In Section 4.1, we have shown the experimental results of HPM on two population synthetic functions. It is worth noting that, since the synthetic function only simulates the validation loss function (i.e., …), … We use the same exploit strategy as in PBT, i.e., truncation selection [ … ]. All code for the synthetic functions was implemented with Autograd. As in Figure 1 of Section 4.1, we show the mean performance … We show the details of the hyperparameters we tuned on the benchmark datasets as follows. Tied weights are used for the embedding and softmax layers.
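The truncation-selection exploit step borrowed from PBT can be sketched roughly as follows. The population representation, the 20% truncation fraction, and the direct score copy (standing in for copying model weights) are simplifying assumptions for illustration:

```python
import random

def truncation_selection(population, frac=0.2, rng=None):
    """Exploit step of PBT-style truncation selection (illustrative sketch).

    population: list of dicts with 'score' and 'hparams'. Workers in the
    bottom `frac` by score copy the hyperparameters (and, standing in for
    model weights here, the score) of a random top-`frac` worker.
    """
    rng = rng or random.Random()
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = rng.choice(top)
        loser["hparams"] = dict(winner["hparams"])  # exploit; explore would perturb next
        loser["score"] = winner["score"]
    return population

# Toy population of five workers; the worst copies the best after one step.
pop = [{"score": s, "hparams": {"lr": 0.1 * (i + 1)}}
       for i, s in enumerate([0.9, 0.2, 0.5, 0.1, 0.7])]
truncation_selection(pop, frac=0.2, rng=random.Random(0))
```

In full PBT this exploit step is followed by an explore step that perturbs the copied hyperparameters.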